# High-precision image captioning
Amoral Gemma3 12B Vision
A vision-enhanced version of soob3123/amoral-gemma3-12B that combines the Gemma3-12B large language model with a visual encoder for multimodal tasks.
Image-to-Text
Transformers English

gghfez
25
2
Pixtral 12b
Apache-2.0
Pixtral is a multimodal model based on the Mistral architecture, capable of processing both image and text inputs to generate detailed textual descriptions.
Image-to-Text
Transformers

mistral-community
31.93k
90